Digest | How Big Data Creates False Confidence
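Issue 2 of the Complexity Digest translations is fresh out of the oven!!!
To meet everyone's strong wish to take part in the translation work, we increased the number of abstracts for this second issue, selecting 39 article abstracts in total from the May issue of Complexity Digest. After two weeks, all translation and proofreading work was completed smoothly.
Special thanks to 傅渥成, 唐璐, and 张江 for their meticulous proofreading of these translations!
Special congratulations to the five translators who earned the title "集智 Outstanding Translator". They are: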
"Decongestion of urban areas with hotspot-pricing"
Translator: HSAH_CHOI
"Untangling performance from success"
Translator: jeffersonchou
"Twelve principles for open innovation 2.0"
Translator: 胡鹏博
"A Schrödinger cat living in two boxes"
Translator: 王继康
"Poverty linked to epigenetic changes and mental illness"
Translator: 李宇峰
Today's post shares translations 5-10, as follows:
5. How Big Data Creates False Confidence
Original link:
http://nautil.us/blog/how-big-data-creates-false-confidence
(Translated by -阎赫)
Although no one can quite agree how to define it, the general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and they can take thousands of computers to collect, store, and analyze. To many companies and researchers, though, the investment is worth it because the patterns can unlock information about anything from genetic disorders to tomorrow’s stock prices. But there’s a problem: It’s tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn’t be wrong. But the bigness of the data can imbue the results with a false sense of certainty. Many of them are probably bogus—and the reasons why should give us pause about any research that blindly trusts big data.
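Although there is no firm agreement on how to define big data, the general idea is to find datasets large enough to reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and it can take thousands of computers to collect, store, and analyze them. Even so, many companies and researchers consider the investment worthwhile, because the patterns can unlock information about anything from genetic disorders to tomorrow's stock prices. But there is a problem: it is tempting to assume that, with such a staggering volume of data behind them, studies built on big data cannot be wrong. The sheer size of the data can lend the results a false sense of certainty. Many of those results are probably bogus, and the reasons why should make us wary of any research that blindly trusts big data.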
6. Habituation in non-neural organisms: evidence from slime moulds
Original link:
http://dx.doi.org/10.1098/rspb.2016.0446
(Translated by - 高德华,edited by 傅渥成)
Learning, defined as a change in behaviour evoked by experience, has hitherto been investigated almost exclusively in multicellular neural organisms. Evidence for learning in non-neural multicellular organisms is scant, and only a few unequivocal reports of learning have been described in single-celled organisms. Here we demonstrate habituation, an unmistakable form of learning, in the non-neural organism Physarum polycephalum. In our experiment, using chemotaxis as the behavioural output and quinine or caffeine as the stimulus, we showed that P. polycephalum learnt to ignore quinine or caffeine when the stimuli were repeated, but responded again when the stimulus was withheld for a certain time. Our results meet the principal criteria that have been used to demonstrate habituation: responsiveness decline and spontaneous recovery. To distinguish habituation from sensory adaptation or motor fatigue, we also show stimulus specificity. Our results point to the diversity of organisms lacking neurons, which likely display a hitherto unrecognized capacity for learning, and suggest that slime moulds may be an ideal model system in which to investigate fundamental mechanisms underlying learning processes. Besides, documenting learning in non-neural organisms such as slime moulds is centrally important to a comprehensive, phylogenetic understanding of when and where in the tree of life the earliest manifestations of learning evolved.
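Learning, defined as a change in behaviour evoked by experience, has so far been studied almost exclusively in multicellular organisms with nervous systems. Evidence for learning in non-neural multicellular organisms remains scarce, and only a few unequivocal reports describe learning in single-celled organisms. Here we demonstrate habituation, an unmistakable form of learning, in the non-neural organism Physarum polycephalum. In our experiment, with chemotaxis as the behavioural output and quinine or caffeine as the stimulus, we found that P. polycephalum learnt to ignore quinine or caffeine when the stimulus was repeated, but responded again once the stimulus had been withheld for a certain time. Our results meet the principal criteria used to demonstrate habituation: responsiveness decline and spontaneous recovery. To distinguish habituation from sensory adaptation or motor fatigue, we also show stimulus specificity. Our results point to the diversity of organisms lacking neurons, which likely display a hitherto unrecognized capacity for learning, and suggest that slime moulds may be an ideal model system for investigating the fundamental mechanisms underlying learning. In addition, documenting learning in non-neural organisms such as slime moulds is central to a comprehensive, phylogenetic understanding of when and where in the tree of life the earliest manifestations of learning evolved.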
7. Crowdsourcing the Robin Hood effect in cities
Original link:
http://arxiv.org/abs/1604.08394
(Translated by - F7,edited by 傅渥成)
Socioeconomic inequalities in cities are embedded in space and result in neighborhood effects, whose harmful consequences have proved very hard to counterbalance efficiently by planning policies alone. Considering redistribution of money flows as a first step toward improved spatial equity, we study a bottom-up approach that would rely on a slight evolution of shopping mobility practices. Building on a database of anonymized credit card transactions in Madrid and Barcelona, we quantify the mobility effort required to reach a reference situation where commercial income is evenly shared among neighborhoods. The redirections of shopping trips preserve key properties of human mobility, including travel distances. Surprisingly, for both cities only a small fraction (∼5%) of trips need to be altered to reach equity situations, improving even other sustainability indicators. The method could be implemented in mobile applications that would assist individuals in reshaping their shopping practices, to promote the spatial redistribution of opportunities in the city.
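Socioeconomic inequalities in cities are embedded in geographic space and give rise to neighborhood effects, whose harms have proved very hard to offset effectively through planning policies alone. Treating the redistribution of money flows as a first step toward reducing spatial inequality, we study a bottom-up approach that requires only slight changes in shopping mobility. Building on a database of anonymized credit card transactions in Madrid and Barcelona, we quantify the mobility effort needed to reach a reference situation in which commercial income is shared evenly among neighborhoods. The redirected shopping trips preserve key properties of human mobility, including travel distances. Surprisingly, in both cities only a small fraction (about 5%) of trips need to change to reach an equitable situation, and other sustainability indicators improve as well. The method could be implemented in mobile applications that help individuals reshape their shopping habits, promoting the spatial redistribution of opportunities within the city.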
8. Combining complex networks and data mining: why and how
Original link:
http://arxiv.org/abs/1604.08816
(Translated by 蔡嘉文, edited by 傅渥成)
The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex networks metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.
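Growth in computing power does not remove the need to extract meaningful information from ever-larger data sets; if anything, it typically makes the task more complex. Two methods for tackling this general problem have emerged at different times and are now widely used in the scientific community: data mining and complex network theory. The two share the same general goal, extracting information from complex systems to build a new, compact, quantifiable representation, and they often address similar problems. Yet surprisingly few researchers use both methodologies. One might be tempted to conclude that the two fields are either largely redundant or completely opposed. The starting point of this review is that this situation stems from contingent rather than conceptual differences, and that the two fields can in fact be used together to advantage. We first give an overview of both fields and illustrate some fundamental concepts, then present a variety of contexts in which complex network theory and data mining have been used together. Specifically, we illustrate contexts in which suitably integrated complex network metrics improve classification rates over classical data mining algorithms and, conversely, contexts in which data mining helps tackle important issues in applications of complex network theory. Finally, we discuss ways to integrate complex networks and data mining more tightly, along with open lines of research.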
9. An Evolutionary Game Theoretic Approach to Multi-Sector Coordination and Self-Organization
Original link:
http://dx.doi.org/10.3390/e18040152
(Translated by -麟凤兰草-ATC-ABM-Canton)
Coordination games provide ubiquitous interaction paradigms to frame human behavioral features, such as information transmission, conventions and languages as well as socio-economic processes and institutions. By using a dynamical approach, such as Evolutionary Game Theory (EGT), one is able to follow, in detail, the self-organization process by which a population of individuals coordinates into a given behavior. Real socio-economic scenarios, however, often involve the interaction between multiple co-evolving sectors, with specific options of their own, that call for generalized and more sophisticated mathematical frameworks. In this paper, we explore a general EGT approach to deal with coordination dynamics in which individuals from multiple sectors interact. Starting from a two-sector, consumer/producer scenario, we investigate the effects of including a third co-evolving sector that we call public. We explore the changes in the self-organization process of all sectors, given the feedback that this new sector imparts on the other two.
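Coordination games provide a ubiquitous interaction paradigm for studying features of human behavior, such as information transmission, conventions and languages, and socio-economic processes and institutions. With a dynamical approach such as Evolutionary Game Theory (EGT), we can follow in detail the self-organization process by which a population of individuals coordinates into a given behavior.
Real socio-economic scenarios, however, often involve interaction among multiple co-evolving sectors, each with specific options of its own, which calls for more general and more sophisticated mathematical frameworks. In this paper, we explore a general EGT approach to coordination dynamics in which individuals from multiple sectors interact. Starting from a two-sector consumer/producer scenario, we investigate the effect of adding a third co-evolving sector, which we call public. We explore how the self-organization process of all sectors changes given the feedback that this new sector exerts on the other two.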
10. Self-organization of river channels as a critical filter on climate signals
Original link:
http://science.sciencemag.org/content/352/6286/694?et_rid=35379714&et_cid=466541
(Translated by Cicely)
Large floods should seemingly influence the depth and width of rivers. Phillips and Jerolmack, however, suggest that the self-organization of bedrock river channels blunts the impact of extreme rainfall events. River channel geometries from a wide range of coarse-grained rivers across the United States show that larger floods have very limited additional impact on channel geometry. River channel sculpting does increase as flood size increases, but the effect is most pronounced for moderate floods. This relationship may explain the long-term stability of rivers across shifts in climate.
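Large floods would seem to influence the depth and width of rivers. Phillips and Jerolmack, however, suggest that the self-organization of bedrock river channels blunts the impact of extreme rainfall events. Channel geometries from a wide range of coarse-grained rivers across the United States show that even larger floods have very limited additional impact on channel geometry. River channel sculpting does increase with flood size, but the effect is most pronounced for moderate floods. This relationship may explain the long-term stability of rivers across shifts in climate.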